[NVIDIA] Blackwell Family #24673
Code Review
This pull request updates the CMake configuration to support the new NVIDIA Blackwell architecture family, aligning with CUDA 12.9+ features. The changes introduce new architecture codes and update the minimum required CUDA version for Blackwell-specific kernels. While this is a necessary update, I've identified a critical issue with how the new architecture suffixes are handled, which will likely cause build failures. Additionally, there's a potential regression for users on CUDA 12.8 that should be addressed.
No ciflow labels are configured for this repo.
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Thanks for the PR! Could you please add some more information to the PR description about what this enables, for example:
This correctly enables:
@johnnynunez do you want to update CUTLASS separately? The 4.2 tag hasn't been made yet.
Hello, CUTLASS v4.2.0 is out today!
cc @Aidyn-A could you review this PR?
Aidyn-A left a comment
I have left some comments. I would like you to test the build, make sure it is 100% successful, and double-check all the kernels on all machines. I doubt that those CUTLASS kernels need extra arch flags, as CUTLASS kernels tend to be arch-specific (e.g. a CUTLASS kernel written for sm_100 can be unusable on sm_120 or sm_110). Please keep in mind that not all arch-conditional instructions can be replaced with family-conditional ones.
Additionally, I have not dug into it, but I am pretty sure the function cuda_archs_loose_intersection needs to be modified to handle family-conditional flags, as it already does for arch-conditional ones:
Lines 315 to 325 in f4cd80f
Lastly, the regular arch flags and their family-conditional counterparts conflict with each other. For example:
nvcc code.cu -gencode arch=compute_120,code=sm_120 -gencode arch=compute_120f,code=sm_120f
will end up failing:
nvcc fatal : The same GPU code (`sm_120`) generated for non family-specific and family-specific GPU arch
That being said, it is important to be very precise about which flags are applied here. I would modify cuda_archs_loose_intersection to exclude the basic arch flag whenever its family-conditional counterpart is being passed.
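The exclusion rule suggested above can be modelled outside CMake. The following is a minimal Python sketch, not the actual cuda_archs_loose_intersection implementation: the helper names drop_base_when_family_present and to_gencode_flags and the example arch list are hypothetical. It assumes arch strings of the form 12.0, 12.0a, and 12.0f, and drops a plain entry only when its f counterpart is also requested, so nvcc is never asked for both sm_120 and sm_120f.

```python
def drop_base_when_family_present(archs):
    """Remove plain arch entries (e.g. "12.0") when the family-conditional
    variant (e.g. "12.0f") is also in the list.

    Hypothetical helper for illustration; it models only the exclusion
    rule, not the rest of the CMake function.
    """
    # Numeric part of every family-conditional entry, e.g. "12.0f" -> "12.0".
    family_bases = {a[:-1] for a in archs if a.endswith("f")}
    result = []
    for arch in archs:
        if arch in family_bases:
            # Plain entry whose "f" counterpart is present: skip it.
            continue
        if arch not in result:
            result.append(arch)
    return result


def to_gencode_flags(archs):
    """Translate arch strings such as "12.0f" into nvcc -gencode flags."""
    flags = []
    for arch in archs:
        sm = arch.replace(".", "")  # "12.0f" -> "120f"
        flags.append(f"-gencode arch=compute_{sm},code=sm_{sm}")
    return flags


requested = ["9.0a", "12.0", "12.0f"]
filtered = drop_base_when_family_present(requested)
print(filtered)                    # ['9.0a', '12.0f']
print(to_gencode_flags(filtered))
```

Arch-conditional entries such as 9.0a are left untouched in this sketch; whether they also need special handling next to an f entry is not covered here.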
Does
This code is strange: #if defined CUDA_VERSION. I need to analyze and test it. @jasl, do you have simple code for testing whether the capability is active and runs?
@DrStone71 My testing machine is an x86 with an RTX Pro 6000 (SM120).
I'm not sure this kernel labeled
And I have a patch that works for me: jasl@b52d720
Because SM120 doesn't contain SM100 features, it's difficult to disable the compilation for non-SM100 platforms.
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Is a check with CUDA 13 and sm_120 needed? I use a physical machine. Is a particular environment needed for testing (software versions or pip packages)? DrStone71
Test MoE, gptoss, etc.
Signed-off-by: Johnny <johnnynuca14@gmail.com> Signed-off-by: johnnynunez <johnnynuca14@gmail.com> Signed-off-by: Johnny <johnnync13@gmail.com> Signed-off-by: Salvatore Cena <cena@cenas.it> Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Co-authored-by: Salvatore Cena <cena@cenas.it>
Signed-off-by: Johnny <johnnynuca14@gmail.com> Signed-off-by: johnnynunez <johnnynuca14@gmail.com> Signed-off-by: Johnny <johnnync13@gmail.com> Signed-off-by: Salvatore Cena <cena@cenas.it> Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Co-authored-by: Salvatore Cena <cena@cenas.it> Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Johnny <johnnynuca14@gmail.com> Signed-off-by: johnnynunez <johnnynuca14@gmail.com> Signed-off-by: Johnny <johnnync13@gmail.com> Signed-off-by: Salvatore Cena <cena@cenas.it> Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Co-authored-by: Salvatore Cena <cena@cenas.it> Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
# moe_data.cu is used by all CUTLASS MoE kernels.
cuda_archs_loose_intersection(CUTLASS_MOE_DATA_ARCHS "9.0a;10.0a" "${CUDA_ARCHS}")
if(${CMAKE_CUDA_COMPILER_VERSION} VERSION_GREATER_EQUAL 13.0)
Why is it 13.0 rather than 12.9?
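To make the effect of the threshold concrete, here is a small Python sketch of such a version gate. It is an illustration only: the function select_moe_archs and both arch lists are hypothetical stand-ins, not the values used in this PR. The one fact it relies on is that family-specific ("f") targets were introduced with CUDA 12.9.

```python
def parse_version(text):
    """Turn "12.9" or "13.0.1" into a tuple of ints, padded to three
    components so that "12.9" and "12.9.0" compare as equal."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def select_moe_archs(compiler_version, family_threshold):
    """Pick an arch list depending on the compiler version.

    Both lists are hypothetical placeholders used only for this example.
    """
    if parse_version(compiler_version) >= parse_version(family_threshold):
        return ["9.0a", "10.0f"]  # family-conditional Blackwell target
    return ["9.0a", "10.0a"]      # arch-conditional targets only


# Same compiler (CUDA 12.9), two candidate thresholds.
for threshold in ("13.0", "12.9"):
    print(threshold, select_moe_archs("12.9", threshold))
```

With a 13.0 threshold, a CUDA 12.9 compiler takes the arch-conditional branch even though it already understands family-specific targets, which is the gap the question above points at.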


https://developer.nvidia.com/blog/nvidia-blackwell-and-nvidia-cuda-12-9-introduce-family-specific-architecture-features/
cc @simon-mo